DOCUMENT RESUME

ED 262 091                                                  TM 850 572

AUTHOR          Jansen, Hans P.
TITLE           Training Emphasis Task Factor Data: Methods of
                Analysis.
INSTITUTION     Air Force Human Resources Lab., Brooks AFB, Tex.
                Manpower and Personnel Div.
REPORT NO       AFHRL-TR-84-50
PUB DATE        May 85
NOTE            26p.
PUB TYPE        Reports - Research/Technical (143)
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Cluster Analysis; Computer Software; Evaluation
                Methods; *Factor Analysis; *Instructional
                Development; *Interrater Reliability; *Job Analysis;
                Measurement Techniques; *Military Training; Needs
                Assessment; Rating Scales; Sample Size; Skill
                Analysis
IDENTIFIERS     Air Force; *REXALL (Computer Software)

ABSTRACT
        The Air Force Occupational Measurement Center
conducts task-based occupational surveys of Air Force specialties
that include supervisor ratings on recommended training emphasis for
entry-level airmen. Priorities are input to the Instructional System
Development training model, which guides the development and revision
of specialty training courses. For 20 percent of specialties,
training emphasis ratings have been subject to poor interrater
agreement. Data may contain conflicting rating policies within a
specialty. To develop a methodology for identifying multiple rating
policies in such data, this research investigates: (1) the variation
in interrater agreement with respect to sample size; and (2) the
multiple rating policy hypothesis via modified REXALL analysis,
cluster analysis, and factor analysis. Agreement is found to vary
within and across sample sizes, and a minimum of 55 raters is
recommended. REXALL analyses are inconclusive with respect to
confirming the presence or absence of multiple rating policies.
Results indicate that samples of training emphasis ratings are less
complex than expected. REXALL analyses are recommended for single
ladder specialties; principal components factor analysis with
VARIMAX rotation is recommended for multiple factors, extracting one
and then multiple factors as appropriate. Interpretation of these
results can be enhanced with CODAP auxiliary programs. (LMO)

*   Reproductions supplied by EDRS are the best that can be made   *
*                  from the original document.                     *



AFHRL-TR-84-50 



AIR FORCE HUMAN RESOURCES LABORATORY

TRAINING EMPHASIS TASK FACTOR DATA:
METHODS OF ANALYSIS

By

Hans P. Jansen
Squadron Leader, Royal Australian Air Force Exchange Officer

MANPOWER AND PERSONNEL DIVISION
Brooks Air Force Base, Texas 78235-5601

May 1985

Final Report

Approved for public release; distribution unlimited.

"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."

U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.

AIR FORCE SYSTEMS COMMAND
BROOKS AIR FORCE BASE, TEXAS 78235-5601



BEST COPY AVAILABLE 



NOTICE 

When U.S. Government drawings, specifications, or other data are used for
any purpose other than in connection with a definitely Government-related
procurement, the United States Government incurs no responsibility or any
obligation whatsoever. The fact that the Government may have formulated
or in any way supplied the said drawings, specifications, or other data,
is not to be regarded by implication, or otherwise in any manner
construed, as licensing the holder, or any other person or corporation; or
as conveying any rights or permission to manufacture, use, or sell any
patented invention that may in any way be related thereto.

The Public Affairs Office has reviewed this report, and it is releasable
to the National Technical Information Service, where it will be available
to the general public, including foreign nationals.

This report has been reviewed and is approved for publication.

WILLIAM E. ALLEY, Technical Director
Manpower and Personnel Division

ANTHONY F. BRONZO, JR., Colonel, USAF
Commander



Unclassified
SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a.  REPORT SECURITY CLASSIFICATION: Unclassified
1b.  RESTRICTIVE MARKINGS:
2a.  SECURITY CLASSIFICATION AUTHORITY:
2b.  DECLASSIFICATION/DOWNGRADING SCHEDULE:
3.   DISTRIBUTION/AVAILABILITY OF REPORT:
     Approved for public release; distribution unlimited.
4.   PERFORMING ORGANIZATION REPORT NUMBER(S): AFHRL-TR-84-50
5.   MONITORING ORGANIZATION REPORT NUMBER(S):
6a.  NAME OF PERFORMING ORGANIZATION: Manpower and Personnel Division
6b.  OFFICE SYMBOL (If applicable): AFHRL/MO
6c.  ADDRESS (City, State and ZIP Code):
     Air Force Human Resources Laboratory
     Brooks Air Force Base, Texas 78235-5601
7a.  NAME OF MONITORING ORGANIZATION:
7b.  ADDRESS (City, State and ZIP Code):
8a.  NAME OF FUNDING/SPONSORING ORGANIZATION:
     Air Force Human Resources Laboratory
8b.  OFFICE SYMBOL (If applicable): HQ AFHRL
8c.  ADDRESS (City, State and ZIP Code):
     Brooks Air Force Base, Texas 78235-5601
9.   PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:
10.  SOURCE OF FUNDING NOS.

     PROGRAM        PROJECT     TASK     WORK UNIT
     ELEMENT NO.    NO.         NO.      NO.
     62703F         7719        19       11, 01
     62703F         7734        07       30

11.  TITLE (Include Security Classification):
     Training Emphasis Task Factor Data: Methods of Analysis
12.  PERSONAL AUTHOR(S): Jansen, Hans P.
13a. TYPE OF REPORT: Final
13b. TIME COVERED: FROM          TO
14.  DATE OF REPORT (Yr., Mo., Day): May 1985
15.  PAGE COUNT: 28
16.  SUPPLEMENTARY NOTATION:
17.  COSATI CODES: FIELD          GROUP          SUB. GR.
18.  SUBJECT TERMS (Continue on reverse if necessary and identify by block number):
     common rating policy
     Comprehensive Occupational Data Analysis Programs (CODAP)
     interrater reliability
     rating policies
     REXALL computer program
19.  ABSTRACT (Continue on reverse if necessary and identify by block number):

     REXALL, a program within the Comprehensive Occupational Data Analysis Programs (CODAP) system,
     is routinely used for assessing the level of interrater agreement obtained when multiple raters
     evaluate "training emphasis" at the task level and for extracting a reliable common rating
     policy (CRP). For some samples, very poor interrater agreement precludes extraction of a
     reliable CRP and limits use of the data. Since poor interrater agreement may be a function of
     multiple rating policies, research was initiated to develop a methodology for identifying the
     multiple rating perceptions that may exist within task factor data. The findings presented
     include the effect of sample size on interrater agreement and the use of modified REXALL
     analysis, cluster analysis and factor analysis techniques for identifying multiple rating
     policies in training emphasis data. Results indicate that REXALL analysis employing new CRP
     extraction criteria is adequate for samples where the CRP includes all raters and where the
     CRP has a divergency of less than 25%. It was also found that principal components factor
     analysis has high utility for identifying the CRP and any other rating policies that might
     exist in the rating data. Possible causes of poor interrater agreement and several alternative
     approaches to identifying the causes for interrater disagreement are discussed. Guidelines for
     occupational analysts to follow when using REXALL and alternative analysis procedures are
     provided.

20.  DISTRIBUTION/AVAILABILITY OF ABSTRACT:
     UNCLASSIFIED/UNLIMITED   SAME AS RPT. □   DTIC USERS □
21.  ABSTRACT SECURITY CLASSIFICATION:
22a. NAME OF RESPONSIBLE INDIVIDUAL: Nancy A. Perrigo, STINFO Office
22b. TELEPHONE NUMBER (Include Area Code): (512) 556-3877
22c. OFFICE SYMBOL: AFHRL/TSR

DD FORM 1473, 83 APR          EDITION OF 1 JAN 73 IS OBSOLETE.

Unclassified
SECURITY CLASSIFICATION OF THIS PAGE



SUMMARY

The Air Force Occupational Measurement Center (USAFOMC) conducts occupational surveys of Air
Force specialties. These include the collection of supervisors' ratings on task factors such as
recommended emphasis for first-term training. The task training emphasis ratings serve as input
to the Instructional System Development (ISD) training model, which guides the development and
revision of technical training courses. Analysis of training emphasis ratings is usually
performed using REXALL, a special-purpose program within the Comprehensive Occupational Data
Analysis Programs (CODAP) system. Two important functions of REXALL are to assess the overall
level of agreement among raters, and to calculate an average (mean) factor rating for each task.
When an acceptable level of interrater agreement is attained, the task means are rank-ordered.
This rank-ordering constitutes the recommended priority of training for each of the tasks and
defines the common rating policy (CRP) for the specialty.

For a small number of specialties, referred to as "complex specialties," very poor
interrater agreement is frequently found that precludes the extraction of a reliable training
emphasis CRP. Driven by the suggestion that poor interrater agreement may be caused by competing
rating policies with possible relevance to training, a Request for Personnel Research (RPR) was
initiated by USAFOMC and validated through Hq Air Training Command. The RPR requested
development of a methodology for identifying multiple rating policies that might exist in such
data.

Research on the possible causes of poor interrater agreement followed two main courses: (a)
investigation of the variation in interrater agreement with respect to the number of raters used
(sample size) and (b) investigation of the multiple-rating-policy hypothesis via three
independent analysis techniques: modified REXALL analysis, cluster analysis, and factor
analysis. These techniques were applied to seven "complex specialties" to see if multiple rating
policies could be identified.

Interrater agreement was found to vary within and across different sample sizes. A sample
of approximately 55 raters is the minimum number recommended for extraction of a reliable CRP.
REXALL analyses were inconclusive with respect to confirming the presence or absence of multiple
rating policies. Cluster analyses using existing CODAP software also proved to be generally
inadequate for identifying multiple rating policies. However, some CODAP programs that report
rater responses in clustering (KPATH) sequence were found to be highly useful for interpreting
observed REXALL statistics.

Results of principal components factor analyses clearly demonstrated that the samples of
training emphasis ratings were less complex than expected. A one-factor solution confirmed that
REXALL analyses which employ modified CRP extraction criteria are appropriate and sufficient for
single-specialty samples which contain a dominant CRP. Where such REXALL analysis failed,
additional analysis using a VARIMAX rotation/factor-building methodology successfully isolated
significantly different multiple rating policies.

It is recommended that REXALL analyses with modified CRP extraction criteria be used for the
vast majority of single-ladder specialties, where one might expect a single dominant training
policy. In those cases when evidence suggests that multiple policies might be operative,
principal components factor analysis with VARIMAX rotation is recommended, extracting one and
then multiple factors as appropriate. Interpretation of these results can be enhanced with CODAP
auxiliary programs (DdVARS, PRTDIS, PRTVAR and FACPRT).






PREFACE

This work resulted from Request for Personnel Research (RPR) 79-1, Analysis of Ratings
by Occupational Task Factors, from Headquarters Air Training Command, and was initiated
under Work Unit 77340730, Complex Specialties Task Training Priority Equation
Development. It was subsequently completed under Work Unit 77191911, Measurement and
Analysis of Job and Mission Requirements. The present effort represents a portion of the
Laboratory's Force Acquisition and Distribution System thrust.

Dr. William Alley and Dr. Hendrick Ruck provided helpful suggestions and significant
assistance in the conduct of this effort.



TABLE OF CONTENTS

  I.  BACKGROUND

 II.  FINDINGS

        Sampling Variations
        Detecting Multiple Rating Policies
            Modified REXALL Analysis
            Cluster Analysis
            Factor Analysis

III.  APPLICATIONS

 IV.  CONCLUSIONS

REFERENCES

Appendix A: CODAP Clustering Description

LIST OF FIGURES

Figure

  1   Single-specialty rating policy domain
  2   Stability of R11 versus sample size

LIST OF TABLES

Table

  1   Training Emphasis Data Samples Analyzed With All Raters Included
  2   Training Emphasis Data Analyzed With Sequential Removal of Divergent Raters
  3   Variation in R11 with Sample Size
  4   Frequency of Occurrence of Rater Correlations
  5   Percentage of Occurrence of Rater Correlations
  6   Analysis Results for the General Factor (CRP) for Each Specialty
  7   Comparison of General Factor (CRP) and Second Iteration Deletion
      Statistics for Each Specialty
  8   General and Rotated Factor Statistics for AFSC 404X0
  9   Rotated Factor Solution for AFSC 404X0
 10   Rotated Factor Solution for AFSC 328XX



TRAINING EMPHASIS TASK FACTOR DATA: METHODS OF ANALYSIS



I. BACKGROUND 



The Air Force Occupational Measurement Center (USAFOMC) conducts task-based occupational
surveys of Air Force specialties. These surveys include the collection of supervisors' ratings
on task factors such as recommended training emphasis. Recommended training emphasis is defined
as the emphasis that should be given in structured training of the task for entry-level airmen,
regardless of where that training takes place (i.e., resident course, Field Training Detachment,
or on-the-job training). First-term training priorities are input to the Instructional System
Development (ISD) training model, which guides the development and revision of specialty training
courses. The utility, reliability, and validity of training emphasis ratings in terms of ISD
theory have been demonstrated by Ruck, Thompson, Brown, and Stacy (in preparation).

For approximately 20% of specialties, training emphasis ratings have been quite difficult to
interpret, due to poor interrater agreement. The suggestion has been that the data for such a
complex specialty may contain conflicting rating policies aligned with the various employment
duties/areas within a specialty. Currently, there are no satisfactory operational techniques for
identifying such multiple policies. Research to develop a methodology for identifying the
various rating perceptions that may exist in training emphasis ratings was initiated as a result
of a Request for Personnel Research (RPR 79-1), Analysis of Ratings by Occupational Task Factors,
submitted by Headquarters Air Training Command.



Analysis of training emphasis rating data is usually performed using REXALL, a
special-purpose program developed and documented by Christal and Weissmuller (1976) within the
Comprehensive Occupational Data Analysis Programs (CODAP) system. The three main functions of
REXALL are (a) to assess the level of interrater agreement, (b) to identify divergent raters, and
(c) to calculate the mean factor rating for each task. With respect to overall interrater
agreement, REXALL is designed to cope with a sample of raters who are anticipated to be
relatively homogeneous in terms of their rating ability.

Ratings for first-term training emphasis are made using a 9-point scale: from 1 (extremely
low) to 9 (extremely high). However, the instruction to "rate only tasks which you believe
require training for first-termers" recognizes the validity of a zero rating. By default, all
non-ratings are interpreted to mean "no training recommended" and are included as zeros in all
REXALL calculations, including the mean training emphasis for each task.

As a measure of interrater agreement, REXALL computes two indices of interrater reliability
using the intraclass correlation formulas reported by Lindquist (1953). The two indices are
R11, single-rater reliability, which approximates the average of all possible pair-wise rater
correlations; and Rkk, reliability for a sample of k raters, which is the expected correlation
between the set of observed sample task means and the task means of a hypothetical equivalent
sample. R11's and Rkk's meeting or exceeding minimum criterion values are interpreted as
meaning that sufficient interrater agreement exists to produce stable estimates of task mean
values.
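The two indices can be illustrated with a minimal sketch. It assumes the common two-way analysis-of-variance form of the intraclass correlation (tasks by raters); the exact Lindquist (1953) formulas that REXALL implements are not reproduced in this report, so the function names and the variance decomposition below are illustrative assumptions, not REXALL's code. Under this form Rkk equals the Spearman-Brown projection of R11 to k raters.

```python
def interrater_reliability(ratings):
    """Return (R11, Rkk) for a tasks-by-raters table of ratings.

    ratings: list of rows, one row per task and one column per rater.
    Unrated tasks are assumed to have been entered as 0 already.
    ASSUMPTION: two-way ANOVA intraclass correlation, not REXALL source code.
    """
    n = len(ratings)        # number of tasks
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    task_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into tasks, raters, and residual.
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_tasks = k * sum((m - grand) ** 2 for m in task_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_tasks - ss_raters

    ms_tasks = ss_tasks / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    r11 = (ms_tasks - ms_error) / (ms_tasks + (k - 1) * ms_error)
    rkk = (ms_tasks - ms_error) / ms_tasks
    return r11, rkk


def spearman_brown(r11, k):
    """Reliability of the mean of k raters implied by single-rater reliability r11."""
    return k * r11 / (1 + (k - 1) * r11)
```

With R11 = .20 and k = 40, `spearman_brown` gives a value of about .91, in line with the Rkk = .90 criterion cited in this report.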

The standard REXALL analysis procedure for achieving acceptable interrater agreement and a
set of reliable task mean ratings is to identify and delete divergent raters, as discussed by
Goody (1976). Divergent raters are those whose ratings differ significantly from the ratings of
the majority of raters because of failure to follow instructions, inverted or poor discriminative
use of the rating scale, unique perception of tasks, or lack of knowledge. These divergent rater
characteristics are reflected by a low or negative correlation between the individual rater's set
of ratings and the sample task means (excluding the subject rater's ratings), and/or a low
t-value (confidence level associated with the correlation being different from zero). A typical
rater sample is assumed to have a simple structure consisting of a majority of good raters who
yield a set of stable task means and a minority of divergent raters who individually disagree
with the majority rating pattern. For determining training emphasis, the rank-ordered task means
computed from the ratings of the residual good raters constitute the recommended training
priority and define the common rating policy (CRP).

The REXALL program provides no information as to why, for some specialties, R11 remains low
even after successive deletions of divergent raters. The rationale underlying the present effort
is that for such specialties, a low R11 may be a function of conflicting multiple rating
policies, each associated with a subgroup of raters sharing similar training perceptions aligned
with a specific employment area within the specialty. If this is the case, then the mean
ratings across a total specialty sample may not reflect any meaningful policy, and significant
policy differences may be obscured by the averaging process.

The present study was aimed at developing a technique to identify and describe such different
policies which, when present, may account for the low interrater reliabilities obtained for some
specialties. In designing the approach, it was recognized that other factors may also contribute
to low interrater agreement. Five factors, in all, were regarded as possible sources of error:
(a) random sampling variance, (b) multi-ladder task lists, (c) random variation in rater
responses, (d) presence of divergent raters, and (e) multiple rating policies. The first of
these, random sampling variance, was investigated by observing the effects on R11 of repeated
samplings involving different numbers of raters. The remaining factors were investigated
employing modified REXALL analysis, CODAP cluster analysis, and factor analysis. These
techniques are described under "Findings." The paragraphs that follow discuss five possible
causes of low R11.

1. Random sampling variance, a function of sample size, was considered to be a potentially
significant cause of low interrater agreement. The average operational training emphasis sample
size is 45 supervisory raters, with a range of 10 to 50 raters. The sample size is primarily a
function of supervisory rater availability. Statistically, there is a greater chance of
obtaining an unrepresentative sample with abnormally low (or high) interrater agreement for the
smaller samples. The relationship between sample size and the interrater reliability indices,
R11 and Rkk, is algebraically summarized by the Spearman-Brown prophecy formula. In general
terms, it states that Rkk increases as R11 and sample size increase. The criterion minimum
for acceptable single-rater reliability, R11 = .20, is obtained from this formula by the
insertion of Rkk = .90 as a widely recognized criterion minimum for stable task means, and a
sample size of approximately 40 raters, which is regarded as sufficiently large to be stable.
Estimation of this minimum sample size assumes the level of interrater agreement and basis for
agreement (rating policy) within the sample reflect that of the parent population. To address
the issue of the stability of R11 as a function of sample size, two large, single-specialty
rater samples were taken as independent finite populations, and 100 subsamples for each of 12
sample-size points in the 10- to 100-rater range were randomly selected and assessed for level of
single-rater reliability (R11). The results are provided in the "Findings" section of this
report.

2. Where more than one specialty is surveyed with a single comprehensive survey instrument
(i.e., for multi-ladder task lists), a low R11 may be attributable to conflicting
specialty-aligned interests with little or no common training recommended. REXALL analysis would
obviously be inappropriate under this condition. Analysis results of a dual-specialty sample,
both in combined form and as two single specialties, are included in the investigation of
multiple rating policies.

3. Random variation in rater responses may occur where most raters disagree due to their
highly individual interpretations of the task list and/or rating scale. This represents the
extreme multiple-rating-policy condition. Although the research approach taken here uses cluster
and factor analyses as primary methods, an understanding of how interrater agreement is assessed,
and how rating policies are examined using existing techniques is in order. Being the primary
ratings analysis tool readily available in CODAP, REXALL is normally used for analyses of all
ratings.

4. The presence of divergent raters may serve to depress interrater agreement. Existing
REXALL procedures for extracting a reliable CRP involve the initial deletion of the divergent
raters (pass 1) and, if necessary, deletion of any newly identified divergent raters (pass 2).
Divergent raters are eliminated from the sample to achieve stable estimates of task means.
Consistently observed increases in R11 and Rkk resulting from the deletion of divergent
raters in operational samples support this procedure and contribute to the face validity of the
following USAFOMC CRP extraction criteria for training emphasis: (a) minimum acceptable level of
interrater agreement, R11 = .20, Rkk = .90; (b) minimum acceptable rater correlation with
mean, r = .30 and/or t-value = 3.0; (c) deletion boundaries: maximum of two deletion passes,
maximum of 10% raters deleted; and (d) minimum number of good raters, 40. Complex specialties
are defined as those whose training emphasis ratings fail to provide a reliable CRP via
application of these procedures and criteria. However, the presence of an inordinate number of
divergent raters may disguise an underlying CRP to an extent which renders existing CRP
extraction criteria unsuitable. If, on the other hand, excessive rater divergence is viewed not
as a distinction between good and poor raters, but as an indicator of multiple rating policies,
then the fifth factor comes into play. This factor assumes the adequacy of the listed CRP
extraction criteria for small or moderate divergence and assumes complexity to be attributable to
competing rating policies when interrater agreement and divergence criteria are not met. It is
important to note that the multiple rating policy condition does not preclude the possibility of
a CRP which is not readily discernible via standard REXALL analysis nor the existence of
divergent raters.

5. Multiple rating policies can be defined in terms of differences in the rank-ordering of
tasks between various paired subgroups of raters. A Spearman rank-order correlation with an
r < .50 was taken as indicating a practical difference in the recommended training priority
between any two rating policy groups. These differences may be attributed to any combination of
differences in number, type, and level of tasks recommended. The greatest possible difference
between any two policies is that they recommend totally different sets of tasks for training.
Relatively small policy differences would result from minor variation in the level of
recommendations on the same set of tasks. In relation to meaningful alternative training
policies, it would be highly desirable for raters within significantly different rating policy
groups to share a common background characteristic such as job title or major command (MAJCOM),
which could be viewed as explanatory factors contributing to policy differences.
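The rank-order comparison described in item 5 can be sketched as follows. This is a minimal illustration, assuming that each candidate policy is represented by the task means of a preselected rater subgroup; the function names and the tie-handling rule (average ranks) are assumptions made for the example and are not taken from CODAP.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for p in range(i, j + 1):
            ranks[order[p]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # no rank variation, so no ordering to compare
    return sxy / (sxx * syy) ** 0.5


def policies_differ(ratings, group_a, group_b, threshold=0.50):
    """Compare the task-mean rank orders of two rater subgroups.

    ratings: rows are tasks, columns are raters; group_a and group_b are
    lists of column indices. Returns (rho, rho < threshold).
    """
    means_a = [sum(row[j] for j in group_a) / len(group_a) for row in ratings]
    means_b = [sum(row[j] for j in group_b) / len(group_b) for row in ratings]
    rho = spearman(means_a, means_b)
    return rho, rho < threshold
```

The default threshold of .50 mirrors the practical-difference cutoff stated in item 5; how the subgroups themselves are formed is left open here.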

The postulated single-specialty rating policy domain is summarized in Figure 1. The simple
or complex specialty classification corresponds to achievement or nonachievement of a reliable
CRP employing the previously described standard REXALL analysis procedure and criteria. The
multi-ladder sample type is not included in Figure 1 since this type is obviously predisposed to
being complex and is, therefore, unsuitable for REXALL analysis.



                         SINGLE-SPECIALTY SAMPLE
                                    |
                                 REXALL
                            Current Criteria
                                    |
               +--------------------+--------------------+
               |                                         |
         Achievement of                           Nonachievement
          Reliable CRP                            of Reliable CRP
               |                                         |
        SIMPLE SPECIALTY                         COMPLEX SPECIALTY

    CRP (includes all raters)                CRP divergency > 10%
    CRP divergency <= 10%                    Two or more competing
                                             policies (no CRP)
                                             No main policies

Figure 1. Single-specialty rating policy domain.



In the current investigation various analytical techniques were tested with training emphasis
data from six specialties. Details for the six training emphasis data sets analyzed in this
study are summarized in Table 1. The first two data sets were obtained from USAFOMC as examples
of complex specialties with very poor interrater agreement. The third USAFOMC data set, a
two-career-ladder study, was analyzed both in the combined form and as two single-specialty
samples. The remaining two data sets were for specialties deemed complex as a consequence of the
AFHRL training emphasis equation study (Ruck et al., in preparation). Application of standard
criteria for deletion of divergent raters produces levels of interrater agreement as per Table
2. All samples fail to qualify as a simple specialty under strict application of the 10% maximum
deletion criterion. However, the relatively high levels of interrater agreement for AFSCs 328X0,
328X1, and 672X2 suggest the specialties to be simple rather than complex. Attainment of minimum
interrater agreement with a relatively high deletion percentage for AFSCs 811X0 and 304X4 render
them possible complex specialties. The small AFSC 404X0 sample and the dual-specialty AFSC 328XX
sample are complex.
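The sequential deletion of divergent raters that underlies Table 2 can be sketched as below. This is a hedged illustration rather than the REXALL routine: it flags a rater on the correlation criterion alone (r < .30, as in the table notes), reports the t-value only for inspection, and does not enforce the 10% deletion ceiling. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rater_stats(ratings, j):
    """Correlation, and its t-value, between rater j and the mean of the other raters."""
    own = [row[j] for row in ratings]
    others = [(sum(row) - row[j]) / (len(row) - 1) for row in ratings]
    r = pearson(own, others)
    n = len(ratings)
    t = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


def delete_divergent(ratings, min_r=0.30, max_passes=2):
    """Drop raters whose correlation with the remaining raters' task means is below min_r.

    Returns the column indices of the raters kept after at most max_passes passes.
    """
    kept = list(range(len(ratings[0])))
    for _ in range(max_passes):
        if len(kept) < 3:
            break  # too few raters left to form a meaningful comparison mean
        sub = [[row[j] for j in kept] for row in ratings]
        bad = {kept[i] for i in range(len(kept)) if rater_stats(sub, i)[0] < min_r}
        if not bad:
            break
        kept = [j for j in kept if j not in bad]
    return kept
```

Comparing `len(kept)` with the original number of raters gives the deletion percentage that the 10% criterion refers to.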


II. FINDINGS 

The findings presented pertain to the investigations of sampling error and multiple rating
policies as possible causes of poor interrater agreement.

Sampling Variations 

Two specialties, 304X4 and 672X2, were selected as probable complex specialties and rating
data were collected from especially large samples of raters to permit analysis of sample size
effects. Table 3 details the variation in R11 at three sample sizes (10, 50, and 100 raters)
for the two specialties. In each case, the average R11 (X) and variation in R11 (SD) are for
100 random subsamples. The observed range in R11 is described by the MIN and MAX values which
illustrate the extent to which observed interrater agreement differed from that of the parent
population for a typical operational sample of 10 to 100 raters. The relationship between the
stability of R11 (SD of R11) and sample size is graphically summarized by the curves through
the data points in Figure 2. Both Table 3 and Figure 2 demonstrate that, for corresponding
sample sizes, the variation in R11 for the AFSC 672X2 raters is greater than that for the AFSC
304X4 raters, with stabilization of R11 (SD = .02) occurring at n = 100 and n = 50,
respectively. With respect to establishing a suitable sample size for REXALL analysis, both
specialties are sufficiently stable at the 50- to 60-rater size to permit extraction of the CRP
(if present). For sample sizes much below 50 raters, the problem of sampling error, as a cause
of poor interrater agreement, is more significant.
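The subsampling procedure summarized in Table 3 can be sketched as follows. This is a minimal illustration under stated assumptions: R11 is computed with a two-way analysis-of-variance intraclass correlation (the Lindquist formulas used by REXALL are not given in this report), subsamples of raters are drawn without replacement from a finite rater population, and the function names and the seed parameter are hypothetical.

```python
import random
import statistics


def single_rater_reliability(ratings):
    """R11 from a two-way (tasks x raters) analysis of variance (assumed form)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_tasks = k * sum((sum(row) / k - grand) ** 2 for row in ratings)
    ss_raters = n * sum(
        (sum(row[j] for row in ratings) / n - grand) ** 2 for j in range(k)
    )
    ms_tasks = ss_tasks / (n - 1)
    ms_error = (ss_total - ss_tasks - ss_raters) / ((n - 1) * (k - 1))
    return (ms_tasks - ms_error) / (ms_tasks + (k - 1) * ms_error)


def r11_stability(ratings, size, draws=100, seed=0):
    """Mean, SD, MIN, and MAX of R11 over random rater subsamples of a given size."""
    rng = random.Random(seed)
    raters = range(len(ratings[0]))
    values = []
    for _ in range(draws):
        chosen = rng.sample(raters, size)  # raters drawn without replacement
        sub = [[row[j] for j in chosen] for row in ratings]
        values.append(single_rater_reliability(sub))
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }
```

Calling `r11_stability` for each sample-size point of interest yields one row of a table laid out like Table 3.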



Table 1. Training Emphasis Data Samples Analyzed
with All Raters Included

                                                    Number    Number
AFSC    Title                            Source     Raters    Divergents     R11    Rkk

404X0   Precision Imagery and Audio-     USAFOMC      47      [illegible]    .09    .73
        Visual Media Maintenance
811X0   Security Specialist              USAFOMC     120          23         .15    .95
328XX   Avionics Communications/         USAFOMC     148          34         .12    .95
        Navigation Systems
328X0   Avionic Communications           USAFOMC      65          11         .41    .98
        Systems
328X1   Avionic Navigation Systems       USAFOMC      83           7         .27    .97
672X2   Disbursement Accounting          AFHRL       149          20         .26    .98
304X4   Ground Radio Communications      AFHRL       335          48         .17    .98
        Equipment

Note. R11 and Rkk values are for the total sample (Number Raters), which
includes the number of divergents (r < .30) shown.



Table 2. Training Emphasis Data Analyzed With Sequential Removal of
Divergent Raters

          After First Set of Deletions          After Second Set of Deletions
          Number       Number                   Number       Number                  Total %
AFSC      Raters       Divergent   R11    Rkk   Raters       Divergent   R11    Rkk  Deleted

404X0       35            1        .13    .84     34            2        .14    .85     28
811X0   [illegible]       2        .20    .96     95            0        .21    .96     21
328XX      114            3        .14    .95    111            2        .15    .95     25
328X0       54            0        .55    .99                                           17
328X1   [illegible]       2        .29    .97     74            0        .32    .97     11
672X2      129            2        .36    .99    127            0        .37    .99  [illegible]
304X4      287            4        .20    .99    283            6        .20    .99     16

Note. R11 and Rkk are for the Number of Raters, which includes the number of newly
identified divergent raters (r < .30) shown.



Table 3. Variation in R11 with Sample Size

               R11 for AFSC 672X2                 R11 for AFSC 304X4
Sample
Size        Mean    SD     MIN    MAX          Mean    SD     MIN    MAX

10          .238   .112   .017   .517          .156   .061   .025   .205
50          .257   .033   .144   .335          .167   .020   .119   .214
100         .259   .021   .211   .308          .16?   .012   .132   .196

Total       N = 129, R11 = .2596               N = 287, R11 = .1686

Note. Data elements (Mean, SD, MIN, MAX) are for 100 randomly
drawn samples for each sample size.




[Figure: plot with sample size (10 to 100) on the horizontal axis, a vertical scale running
from .01 to .12, and a curve labeled AFSC 304X4.]

Figure 2. Stability of R11 versus sample size.
Detecting Multiple Rating Policies






Modified REXALL Analysis

Given that REXALL is specifically designed to evaluate rater performance with respect to a
single rating policy, employing it as a tool to assist with the identification of multiple rating
policies within a single data set requires that rater subgroups representing potential rating
policies be somehow preselected. Modified REXALL analysis involved two different methods for
predefining potential rating policy groups.

First, the possibility that a complex rating data set might be comprised of one dominant
policy and a smaller minor policy was investigated by iteratively applying REXALL; i.e., by
removing the raters having a relatively high correlation with the sample mean vector from the
original set of raters and running REXALL on the two resulting sets of raters until the
policies and assorted divergent raters have been identified. This approach assumes that the
sample mean vector is driven by the dominant policy raters and requires an arbitrary criterion
correlation point to establish potential rating policy group membership. Tables 4 and 5 contain
the distribution and percentage of occurrence of rater correlations produced by the respective
sample mean vectors. A criterion correlation point of .30 to divide raters led to dominant policy






results as produced by the existing procedure for extracting the common rating policy (see Table
2). REXALL analysis of the potential minor policy groups resulted in very poor interrater
agreement for all samples. Adjustment of the criterion correlation point to .40 produced very
stable dominant policies for all specialties except AFSCs 404X0 and 328XX. All potential minor
policy groups displayed very poor interrater agreement. Considering the arbitrary nature of the
criterion correlation point, and the questionable assumption that similar rater correlations
equate to similar rating patterns, the results for all samples were inconclusive with respect to
confirming the presence or absence of the dominant/minor policy condition. In general, this
method was a poor one for dealing with complex specialties.
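
The splitting step used in this first modified REXALL method can be sketched as follows. This
is a minimal illustration under stated assumptions: ratings are held as one list of task
ratings per rater, every rater has a rating (possibly zero) for every task, and the function
names are illustrative rather than part of REXALL.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_by_mean_correlation(raters, criterion=0.30):
    """Split rater indices by their correlation with the sample task mean vector.

    Raters at or above the criterion point form the potential dominant policy
    group; the rest form the potential minor policy group.
    """
    task_means = [mean(column) for column in zip(*raters)]
    dominant, minor = [], []
    for index, ratings in enumerate(raters):
        r = pearson(ratings, task_means)
        (dominant if r >= criterion else minor).append(index)
    return dominant, minor
```

Because the mean vector is computed from all raters, the split depends on the assumption
noted above that the dominant policy raters drive that vector; raising `criterion` to .40
corresponds to the adjusted criterion point discussed in the text.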



Table 4. Frequency of Occurrence of Rater Correlations
(Pearson Product-Moment Correlations)

                                 Number of Raters Correlating with the Mean (Interval)
        No. of                  1.0-   .89-   .79-   .69-   .59-   .49-   .39-   .29-
AFSC    Raters   R11   Rkk      .90    .80    .70    .60    .50    .40    .30    .20

404X0     47     .09   .73                             5     10      9     11     12
811X0    120     .15   .95                      3     20     30     29     15     23
328XX    148     .12   .95                             2     ?2      4     46     34
328X0     65     .41   .98       2     21     19       7      4      0      1     11
328X1     83     .27   .97              4     20      21      9     15      7      7
672X2    149     .26   .98             30     32      19     18     19     11     20
304X4    335     .17   .98                    15      58     93     76     35     48

Note: The ranges are for Pearson product-moment correlation coefficients (r)
between individual raters and the mean rating. For example, for AFSC 404X0 there are
five raters who correlate less than .7 but greater than or equal to .6 with the total
sample task mean vector.



A second modified REXALL analysis method involved the analyses of potential rating policy
groups comprised of raters with common background variables such as duty title, major command,
and specialty code. Previously recorded high levels of interrater agreement for the two separate
specialties, AFSC 328X0 and AFSC 328X1, drawn from the AFSC 328XX dual-ladder sample, constitute
the only interpretable success for this method. The inconsistency of results for all other
samples rendered this approach unsuitable.






Table 5. Percentage of Occurrence of Rater Correlations
(Pearson Product-Moment Correlations)

        No. of                       Percentage of Raters
AFSC    Raters   R11   Rkk       Good    Doubtful    Divergent

404X0     47     .09   .73        51        23          26
811X0    120     .15   .95        68        13          19
328XX    148     .12   .95        46        31          23
328X0     65     .41   .98        81         2          17
328X1     83     .27   .97        84         8           8
672X2    149     .26   .98        80         7          13
304X4    335     .17   .98        76        10          14

Note: Percentage distribution of all REXALL rater correlations with
respect to three categories: good raters (r > .4); doubtful raters
(.4 > r > .3); and divergent raters (r < .3).



Cluster Analysis 

The CODAP clustering programs were applied to the samples in an attempt to develop new
procedures and guidelines for using and interpreting existing clustering software with task
factor data. Appendix A provides a description of the clustering programs, the similarity
measure (percent training emphasis in common), and auxiliary CODAP programs used to interpret the
clusterings. For all samples, the percent-training-emphasis-overlap algorithm aggregated the
raters who were very homogeneous with respect to the number and type (by duty) of tasks rated.
REXALL analysis of these main rater groups produced significantly higher values of R11 and
higher individual rater correlations with their respective group task mean vectors than were
observed with the parent sample. This indicated that those raters who have high overlap with one
another on the ratings of tasks they choose to recommend for training display a high level of
overall interrater agreement. Merging of these groups resulted in rater clusters with reduced
levels of interrater agreement.
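
The similarity measure named above (percent training emphasis in common, described in
Appendix A) can be sketched as follows. This is an illustration only: reading "sum of linear
overlap on corresponding tasks" as the sum, over tasks, of the smaller of the two raters'
percentages is an interpretation, and the function names are hypothetical rather than CODAP
routines.

```python
def percent_emphasis(ratings):
    """Rescale one rater's 0-9 task ratings to percentages of that rater's rating sum."""
    total = sum(ratings)
    if total == 0:
        raise ValueError("rater assigned no training emphasis to any task")
    return [100.0 * rating / total for rating in ratings]


def overlap(rater_a, rater_b):
    """Percent training emphasis in common: sum over tasks of the smaller percentage."""
    pct_a, pct_b = percent_emphasis(rater_a), percent_emphasis(rater_b)
    return sum(min(a, b) for a, b in zip(pct_a, pct_b))


def overlap_matrix(raters):
    """Rater-by-rater similarity matrix of pairwise overlaps (0 to 100)."""
    return [[overlap(a, b) for b in raters] for a in raters]
```

Under this reading, two raters who spread their emphasis over the same tasks in the same
proportions overlap 100 percent even when one rates every task 2 and the other rates every
task 9, because the percentage conversion removes the level of the ratings.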

Group rating policies differed to varying degrees in their rank-ordering of tasks. Within
each sample, the strongest differences (r_s < .50) occurred between groups rating virtually all
or many tasks across all duties and those rating few tasks across duties or rating tasks confined
to very few duties. These rating policy groups were minor in number and size and represent
raters with extreme training recommendations. Less prominent policy differences (r_s > .50)
occurred between groups rating closer to the sample average number of tasks rated. Raters in
these groups constituted the bulk of each sample and tended to emphasize much the same technical
duties, which contained a large, common core of high-training-priority tasks.






The dual-specialty AFSC 328XX sample and the small AFSC 404X0 sample clusterings exhibited
individual differences not observed in the other clusterings. For the AFSC 328XX sample, 89% of
raters clustered into two single-specialty groups: AFSC 328X0 or AFSC 328X1. Within each
single-specialty group, rating policy correlations are highly positive (r_s > .50). Across
specialty groups, rating policy correlations are negative. The AFSC 404X0 clustering produced
three small rater groups which account for only 63% of the sample. All three group rating
policies demonstrate significant differences highlighted by very low between-rating-policy
rank-order correlations (r_s < .50). Ungrouped raters (27%) were regarded as heterogeneous,
isolate raters.

A valuable feature of the CODAP system is the capability to process rater background
information. The CODAP DUVARS, PRTDIS, and PRTVAR rater data summaries in clustering (KPATH)
sequence were found to be useful aids for interpreting observed REXALL interrater reliability
statistics and rater correlations. The PRTVAR program can be utilized to summarize rater
biographies in the KPATH clustering sequence to determine the extent of shared background
characteristics within rater groups. For all single-specialty samples, rater characteristics,
such as grade, major command, primary and duty specialty, and job title/work station (available
only for AFSCs 672X2 and 304X4), could not be discerned to have any obvious connection with
cluster groups. Application of discriminant analysis to establish the extent to which background
variables predict cluster group membership failed to detect any meaningful associations. In the
case of the dual specialty (AFSC 328XX), raters clearly clustered into primary duty rating policy
groups; i.e., either AFSC 328X0 or AFSC 328X1.

In summary, the CODAP clustering of training emphasis ratings produced cluster structures
comprised of a number of rater groups with rating policy differences which were mainly a function
of variation in the number and type of tasks and duties raters chose to recommend for training.
However, four limitations are seen as major obstacles to accepting the training emphasis cluster
structures as a generally suitable method for identifying multiple rating policies. First, the
adjustment of ratings to a percentage of a rater's total rating sum results in the loss of
important information about the level (magnitude) of assigned ratings. Second, the overall
clustering is strongly driven by overlap over all non-zero-rated tasks, which detracts from
common duty emphasis. Third, subjective decisions are required to determine the cluster group
boundaries. Last, the status of the considerable number of isolate raters (5% to 20%) is an
unknown. Because of these limitations, the clustering of training emphasis ratings is regarded
as generating a rater sequence incorporating rater subsets which are useful only as a meaningful
summary of rater characteristics and not representative of multiple rating policies.

Since a CODAP approach, if successful, would offer many operating conveniences, five
additional approaches were tested for making use of the clustering programs. These techniques,
which were based on assumptions not reported here, involved different treatments of the raw data
prior to input to the CODAP clustering programs. The five data treatments were as follows: (a)
direct input of the raw ratings to the OVRLAP program, bypassing the usual INPSTD percentage
conversion described in Appendix A; (b) conversion of all non-zero ratings to values of 1, with
all zeros left zero; (c) conversion of all non-zero ratings to values [value illegible], with all
zero ratings ignored in the clustering programs; (d) conversion of all ratings by adding 1,
producing a 1 to 10 rating scale, with no zeros in the analysis; and (e) a conversion designed to
give higher weight to the higher raw ratings. In this last conversion all original non-zero
ratings with a value of X were transformed to a power of 2 (exponent illegible), and all zeros
ignored in the clustering. In every case these similarity measures generated much the same
clustering group structure as the percent training emphasis clustering. The CODAP clustering
approach was consequently discarded as a suitable analysis technique for identifying multiple
rating policies.

Factor Analysis

A Q-type principal components factor analysis (MAX-FACTOR program) with a rater by rater
correlation matrix input (TRICOR program using ratings on a 0-9 scale) was applied to each
training emphasis sample. With this approach, raters were treated as variables loading on
factors (dimensions of common variance) which were interpreted as potential rating policies. The
customary criterion factor loading of .33 (approximately 10% of a rater's variance accounted
for) was taken as the minimum absolute value for meaningful rater contribution to a factor rating
policy. Each factor rating policy was defined by examining the pattern of rater loadings in
relation to considerations such as rater background characteristics, percent training emphasis
per duty, and allocation of the rank-ordered task means for a factor rating policy group. The
relative strength of rating policies was determined by comparing their respective common
variances as proportions of total variance accounted for (%N).
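
The Q-type approach described above can be sketched for the first (general) factor as
follows. This is a minimal illustration and not the MAX-FACTOR or TRICOR programs: it
extracts only the first principal component of the rater-by-rater correlation matrix by
power iteration, and the function names, the starting vector, and the iteration count are
assumptions made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def general_factor(raters, iterations=500):
    """Return (loadings, share of total variance) for the first principal component.

    Raters are the variables (Q-type analysis); tasks are the observations.
    """
    corr = [[pearson(a, b) for b in raters] for a in raters]
    k = len(corr)
    vec = [1.0] + [0.0] * (k - 1)           # arbitrary starting vector (assumption)
    eigenvalue = 0.0
    for _ in range(iterations):             # power iteration on the correlation matrix
        prod = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        eigenvalue = sum(p * p for p in prod) ** 0.5
        vec = [p / eigenvalue for p in prod]
    if sum(vec) < 0:                        # orient so the loadings sum to a non-negative value
        vec = [-v for v in vec]
    loadings = [v * eigenvalue ** 0.5 for v in vec]
    return loadings, eigenvalue / k


def significant_raters(loadings, criterion=0.33):
    """Indices of raters whose absolute loading meets the criterion minimum."""
    return [i for i, loading in enumerate(loadings) if abs(loading) >= criterion]
```

Loadings that are all of one sign suggest a single common policy, while a mix of large
positive and large negative loadings is the bipolar pattern the text associates with
competing policies.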

In contrast to cluster analysis, where rating policies are characteristic of rater groups
with mutually exclusive membership, factor analysis generates rating policies that are external
to the rater set by determining each rater's loading on each rating policy extracted. This
permits evaluation of rater performance across all policies. A further feature of this approach
is the capability to control the number of rating policies for analysis. Initially, the extent
to which a single general factor common rating policy prevails was investigated. By employing a
VARIMAX rotation/factor-building methodology, the relative utility of factor solutions consisting
of iteratively increasing numbers of rating policies was evaluated in order to establish the
multiple rating policy structure which best characterizes the sample and also to establish the
relationship between that structure and the CRP.

General factor solution. The general factor extracted in a one-factor solution accounts for
the greatest amount of shared variance within the data and is conceptualized as the CRP
underlying the total rater set. Analysis of the pattern of rater loadings on this factor
establishes the extent to which the CRP exists within the sample. All single-specialty samples
were found to have a factor CRP characterized by all significant loadings being unidirectional
and by an acceptable level of rater agreement. Except for AFSC 404X0, the common rating policy
accounted for the majority of raters. In contrast, the dual-specialty AFSC 328XX general factor
was comprised of bipolar significant loadings indicative of two strong specialty-specific rating
policies and preclusive of a CRP as the dominant policy for the total sample. Statistics and
details for this factor CRP for the single-specialty samples are presented in Table 6.

Table 6. Analysis Results for the General Factor (CRP)
for Each Specialty

                Number                  % Total
AFSC    Raters(a)   Divergents(b)      Variance     R11    Rkk

404X0      22        25  (53%)           17.6       .22    .86
811X0      93        27  (23%)           23.5       .22    .96
328X0      54        11  (17%)           52.1       .54    .99
328X1      74         9  (11%)           37.8       .32    .97
672X2     125        24  (16%)           40.5       .38    .99
304X4     276        59  (18%)           26.5       .21    .99

(a) Number of Raters equates to number of loadings greater
than criterion minimum of .33 (11% of variance).

(b) Parentheses contain number of divergents as percentage of
total sample.



A detailed analysis of the high-low rater loading sequence on the single-specialty general
factors confirmed the notion that this factor represents the dominant theme which links the
majority of raters within the single-specialty samples. Iterative removal of raters from the low
loading end of the rank-ordered general factor loading sequence resulted in a steady increase in
R11 and Rkk despite decreasing sample size. This continual improvement of interrater
reliability is a function of the systematic reduction of error variance and establishes the
general factor loading sequence as an accurate distribution of rater performance with respect to
the CRP.

Comparison of the REXALL high-low rater correlation sequence (as produced by the sample task
mean vector) with the corresponding general factor high-low rater loading sequence for each
single specialty revealed a close matching in rater rank-orders and correlation/loading values
which tended to virtual equivalence with increasing total sample R11. Corresponding factor CRP
and REXALL analysis results are presented in Table 7. Except for AFSC 404X0, the CRP extraction
criteria for both analysis procedures identified similar or identical divergent rater sets.
Minor differences are due to the retention of a few REXALL doubtful raters (.30 < r < .40), the
inclusion (or exclusion) of whom can be demonstrated to generate negligible perturbations in the
rating policy task mean rank-order. For these five single-specialty samples, the REXALL grand
task mean vector performed adequately as a standard for determining the relative worth of all
raters with respect to the CRP. Large discrepancies between the factor and REXALL analyses
statistics for AFSC 404X0 were caused by the relatively large number of divergent raters (53%)
who did not identify significantly with the specialty CRP. Consequently, the sample task mean
vector produced a REXALL rater correlation sequence which did not reflect the relative worth of
raters with respect to the CRP. For this type of complex sample, routine REXALL analysis
procedures are inappropriate.

Table 7. Comparison of General Factor (CRP) and Second Iteration
Deletion Statistics for Each Specialty

         Number of Raters          R11                Rkk             % Deleted
AFSC     Factor   REXALL     Factor   REXALL    Factor   REXALL    Factor   REXALL

404X0      22       34        .22      .14       .86      .85        53       28
811X0      93       95        .22      .21       .96      .96        23       21
328X0      54       54        .54      .54       .99      .99        17       17
328X1      74       74        .32      .32       .97      .97        11       11
672X2     125      127        .38      .37       .99      .99        16       15
304X4     276      283        .21      .20       .99      .99        18       16

Note: R11 and Rkk for Number of Raters surviving deletion; i.e., general
factor CRP comprised of raters with loadings >= .33 and REXALL results for raters with
correlations >= .30 after two deletion passes.






Although factor analysis was intended primarily to deal with the identification of multiple
rating policies, the information conveyed by the one-factor solution, together with the
factor/REXALL analyses comparisons, permits modification of the original REXALL CRP extraction
criteria described in Section I of this report. In general terms, these findings demonstrate
that for single-specialty samples, the reliable CRP is derived via REXALL analysis when a level
of R11 >= .20 and Rkk > .90 is attained by the successive deletion of sets of divergent raters
(r < .30), providing R11 increases with each deletion pass and no more than 25% to 30% of the
sample is deleted. Allowing for the deletion of this maximum number of divergent raters and
taking into account the R11 stability/sample size findings, it was found that a minimum sample
size of 55 raters was required to attain minimum acceptable interrater agreement. For smaller
samples dictated by rater availability, R11 >= .20 and Rkk > .80 would be acceptable.
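
The modified extraction criteria can be expressed as a simple check, sketched below. The
function name and argument layout are hypothetical; treating the thresholds as inclusive
bounds, using 30% as the deletion ceiling, and requiring a strict increase in the R11
values as supplied are assumptions made for the example. The values in the usage note are
taken from Tables 1 and 2.

```python
def crp_is_reliable(r11_by_pass, rkk, n_start, n_final,
                    min_r11=0.20, min_rkk=0.90, max_deleted=0.30):
    """Check the CRP extraction criteria for a single-specialty sample.

    r11_by_pass: R11 for the full sample followed by R11 after each deletion pass.
    rkk:         Rkk for the raters surviving the final pass.
    n_start:     number of raters before any deletion.
    n_final:     number of raters surviving the final pass.
    """
    increasing = all(later > earlier
                     for earlier, later in zip(r11_by_pass, r11_by_pass[1:]))
    deleted = (n_start - n_final) / n_start
    return (r11_by_pass[-1] >= min_r11 and rkk >= min_rkk
            and increasing and deleted <= max_deleted)
```

For AFSC 811X0 (R11 of .15, .20, and .21; Rkk of .96; 120 raters reduced to 95) the check
passes, whereas for AFSC 404X0 (R11 of .09, .13, and .14; Rkk of .85; 47 raters reduced to
34) it fails on both reliability thresholds. For smaller samples, `min_rkk=0.80` reflects
the relaxed criterion stated above.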

Rotated factor solutions. The VARIMAX rotation redistributes rater variance in an attempt to
isolate the number of discrete rating policies that best characterizes the data in a meaningful
training sense. Theoretically, a principal components analysis requires as many factors (rating
policies) as there are variables (raters). The analysis produces them in order of decreasing
proportions of total variance accounted for. However, it is obvious that the number of useful
policies must be considerably less than the number of raters. The factor-building approach,
whereby an iteratively increasing number of factors are extracted and rotated, starting with the
two-factor solution, is based on the belief that, if significant multiple rating policies with
potential training application exist, they should be represented by those initial factors which
account for a high percentage of the total variance (%H) after rotation. Ideally, these factor
rater groups would (a) display mutually exclusive membership, (b) account for most raters (with
loadings greater than the criterion minimum of .33), and (c) espouse significantly different
rating policies (r_s < .50). More specifically, the analysis is truncated at that optimal
utility point beyond which factors are dropped for interpretive purposes because they (a) consist
of few or no significant loadings, (b) account for relatively small amounts of variance, (c)
provide no further gains with respect to increasing the mutual exclusive membership of prior main
factors, and (d) demonstrate no potential training application.

Application of the VARIMAX rotation/factor-building technique to all samples identified
different rating policies (r_s < .50) in two instances: the complex single specialty, AFSC
404X0, and the dual-specialty sample, AFSC 328XX. For all other samples, the rotated solution
analyses reinforced the CRP as the dominant rating policy by identifying two or three main
internal rating themes as minor variations of the CRP.

The three-factor solution for AFSC 404X0 appeared to be optimal. Factor group membership was
mutually exclusive and accounted for 80% of the sample. Divergent raters who were not accounted
for did not share significant variance beyond the three-factor solution. Statistics for the
single- and three-factor solutions, together with details for the associated rating policies, are
provided in Table 8. The rank-order correlation coefficients (Spearman's r_s) among the three
factors (3F1, 3F2, and 3F3) were low: 3F1/3F2 had r_s = .103, 3F1/3F3 had r_s = .074, and
3F2/3F3 had r_s = .305. These values indicate significant high-priority task/duty differences
(see Table 8). The rater policy groups were identified by the predominant duties they performed:
(a) photographic processing and support equipment, (b) camera and audiovisual maintenance, and
(c) camera maintenance. In summary, the AFSC 404X0 sample is comprised of three discrete and
significantly different rating policies, one of which duplicates a very weak CRP. When combined,
these competing multiple policies render the total sample complex and unsuitable for REXALL
analysis. Details of the three-factor solution for AFSC 404X0 are given in Table 9.
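
The rank-order comparison of two rating policies can be sketched as follows. This is an
illustration rather than the report's software: each policy is assumed to be represented by
its vector of task means, tied values receive average ranks, and the function names and the
use of .50 as the dividing point in `policies_differ` follow the criterion quoted in the
text but are otherwise assumptions.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def policies_differ(task_means_a, task_means_b, threshold=0.50):
    """True when two policies' task rank-orders correlate below the threshold."""
    return spearman(task_means_a, task_means_b) < threshold
```

Applied to the task mean vectors of each pair of factor groups, a result below .50 marks the
pair as espousing different rating policies in the sense used above.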






Table 8. General and Rotated Factor Statistics for AFSC 404X0

                    Factor    No. of    % Total                  No. of High-
Solution            Group     Raters    Variance    R11    Rkk   Priority Tasks

General Factor      CRP         22        17.6      .22    .86        139
Rotated Factors     3F1         16        16.8      .32    .91        148
                    3F2         13        10.9      .22    .78        130
                    3F3          9         9.7      .03    .23         40

No. of high-priority tasks by duty (duties E, F, G, H, I, J, K, L, M); legible entries
are listed in printed order, and blank cells and column positions are not recoverable:

CRP:   11, 35, 56, 24, 0, 0, 13
3F1:   11, 34, 64, 29, 0, 10
3F2:   6, 41, 16, 15, 20
3F3:   1, 20, 8

Notes: Factor group membership is determined by the number of loadings greater than or
equal to the criterion minimum of .33. Group rating policies are described in terms of duty
emphases associated with high training priority tasks identified by the FACPRT program.
High-priority tasks are defined as those tasks with a mean rating greater than or equal to one
standard deviation above the mean of task means. The frequency distributions of rating policy
task means revealed that, complementary to their respective high-priority tasks, GRP 3F1 and GRP
3F2 assign zero-to-low training emphasis to approximately [illegible]% of all tasks whereas GRP
3F3 allocates an average to above-average training emphasis to 95% of all tasks.



Table 9. Rotated Factor Solution for AFSC 404X0

Factor    Number
Group     Raters    R11    Rkk    Rating Policy

3F1         16      .32    .91    Photographic Processing and Support Equipment
3F2         13      .22    .78    Camera and Audiovisual Maintenance
3F3          9      .03    .23    Camera Maintenance




Details for the optimal three-factor solution for the dual-specialty AFSC 328XX sample are
presented in Table 10. The two main factor groups, 3F1 and 3F2, establish two uniquely different
specialty-specific rating policies virtually identical to those extracted via the separate
analysis of the two component specialties. Group 3F3 consists of raters who, by rating across
all duties, formulate a minor CRP for the total sample. The mutual exclusivity of factor group
membership and the low rank-order correlations between the rating policies they represent render
the total sample complex and unsuitable for REXALL analysis. The r_s values for the comparisons
were 3F1/3F2, r_s = -.344; 3F1/3F3, r_s = -.088; and 3F2/3F3, r_s = .482.

The rotated solutions for the remaining five single-specialty samples share common features
which disqualify the component factors as meaningful multiple rating policies. Each sample is
comprised of rating policies that are minor variations in the CRP. This is evidenced by (a) high
inter-policy rank-order correlations, r_s > .50, (b) rank-order correlations with the CRP in the
range of .70 to .99, (c) non-mutually exclusive membership, (d) high training priority tasks
which are largely accounted for by the CRP high training priority tasks, and (e) rater
memberships which are subsets of the CRP membership. These five single specialties are
appropriately classified as simple or non-complex in that the REXALL CRP reliably subsumes the
competing component rating policies.

Table 10. Rotated Factor Solution for AFSC 328XX

Factor    Number
Group     Raters    R11    Rkk    Rating Policy

3F1         54      .56    .99    AFSC 328X0 CRP (incl. one 328X1)
3F2         71      .33    .97    AFSC 328X1 CRP (incl. two 328X0)
3F3         16      .28    .86    AFSC 328XX CRP (eleven 328X1 and five 328X0)





III. APPLICATIONS

1. REXALL analysis incorporating the new CRP extraction criteria is appropriate for
establishing the overall recommended training priority for a single-specialty sample. The REXALL
configuration of a single-specialty sample likely to contain a reliable CRP is one with the
following characteristics:

a. Single-rater reliability, R11 > .15.

b. Approximately 65% (or more) of raters with correlations, r > .40.

c. Some rater correlations, r > .70.

2. REXALL rater correlation guidelines for retaining or rejecting raters as being reliable
or divergent with respect to the CRP are as follows:

a. If r > .40, reliable rater; retain.

b. If .30 < r < .40, doubtful rater; analyze rating pattern before retaining or
rejecting.

c. If r < .30 and/or t-value < 3.0, divergent rater; reject.

3. Rating pattern analysis to support the retention or rejection of doubtful raters
consists of evaluating the extent to which the following individual rater characteristics diverge
from the majority rating pattern:

a. Total number of non-zero responses.

b. Mean rating and standard deviation on the 1 to 9 scale.






c. Distribution of non-zero ratings on the 1 to 9 scale.

d. Distribution of non-zero ratings across duty areas.

e. Distribution of percentage training emphasis across duty areas.

These rater characteristics are available from the CODAP PRTDIS (for 3a, 3b, and 3c) and DUVARS
(for 3d and 3e) programs. Rater sequencing can be in normal numeric input order or KPATH order.
The latter sequence, which requires additional computing via the CODAP clustering programs
(OVRLAP, GROUP, and KPATH), separates the rater sample into subgroups of raters with highly
similar rating patterns and isolates raters with diverging rating patterns.

4. Applications of these criteria and guidelines would ensure extraction of a reliable CRP
(if it exists) with a single-rater reliability R11 >= .20. The interrater reliability for the
final set of CRP raters (Rkk) will depend on the number of good raters surviving deletion. To
maximize attainment of Rkk > .90, a minimum safe sample size of N = 55 is desirable. For
smaller samples, an Rkk > .80 is acceptable.

5. Principal components factor analysis is appropriate for the analysis of complex single
specialties which fail to attain acceptable interrater agreement with REXALL analysis using the
new CRP extraction criteria, and for multi-ladder survey data with a high potential for
specialty-aligned multiple rating policies. The number and type (unidirectional or bipolar) of
significant loadings on the one general factor solution will define the extent to which a CRP
exists for a sample. Application of the VARIMAX rotation/factor-building analysis technique will
determine the extent to which competing multiple rating policies exist within the sample.

6. In seeking a multiple factor solution, factor extraction and rotation should be stopped
when the factors identified are found to satisfy the following guidelines:

a. High proportion of total variance accounted for.

b. Most raters are accounted for (loadings >= .33) while remaining divergent raters
(loadings < .33) are few and not included within the main factor structure.

c. Results remain relatively stable upon further extraction.

d. The policies found appear reasonable, with potential for generating coherent
training strategies.

7. The veracity of a rotated solution reflecting intended rater training recommendations is
directly proportional to the level of single-rater reliability (R11) within each policy and to
the extent that interpretable differentiation exists between factor policy groups in terms of the
following:

a. Mutually exclusive group membership.

b. Rank-order correlations (r_s < .50).

c. High training priority tasks.

d. Common background variables.
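
The rater correlation guidelines in item 2 above can be sketched as a small classifier. The
function name is hypothetical, and two points are assumptions made for the example: the
boundary values r = .30 and r = .40, which the guidelines leave open, are treated as
doubtful, and the t-value test is applied only when a t-value is supplied.

```python
def classify_rater(r, t_value=None):
    """Classify a rater by correlation with the CRP, following the guidelines in item 2.

    r:       correlation of the rater's ratings with the task mean vector.
    t_value: optional t statistic for that correlation.
    """
    if r < 0.30 or (t_value is not None and t_value < 3.0):
        return "divergent"   # reject
    if r <= 0.40:
        return "doubtful"    # analyze rating pattern before retaining or rejecting
    return "reliable"        # retain
```

Doubtful raters would then be examined against the rating pattern characteristics listed in
item 3 before a decision to retain or reject them.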






IV. CONCLUSIONS

1. Factor analyses of the six single-specialty training emphasis samples in this report,
although uncovering more than one rating policy in each case, have demonstrated them to be less
complex than anticipated. For five of these specialties, there was no practical difference
(r_s > .50) between the rating policies.

2. REXALL analysis employing the new CRP extraction criteria is adequate for CRP including
all raters (ideal) and for CRP with divergency less than 25% (e.g., AFSCs 328X0, 328X1, 811X0,
672X2, and 304X4).

3. REXALL analysis is inadequate for the following sample types: (a) two or more competing
rating policies (e.g., AFSC 404X0), (b) no main policies, and (c) multi-ladder surveys (e.g.,
AFSC 328XX).

4. Modified REXALL analysis and CODAP cluster analysis (normal or experimental types) are
not adequate for identifying multiple rating policies.

5. The CODAP auxiliary summary programs (DUVARS, PRTDIS, PRTVAR, and FACPRT) have high
utility for interpretation of REXALL and factor analyses.

6. Principal components factor analysis has a high utility for identifying the CRP and
multiple rating policies.



REFERENCES

Christal, R. E., & Weissmuller, J. J. (1976). New CODAP programs for analyzing task factor
information (AFHRL-TR-76-3, AD-A026 121). Lackland AFB, TX: Occupational and Manpower
Research Division, Air Force Human Resources Laboratory.

Goody, K. (1976). Comprehensive occupational data analysis programs (CODAP): Use of REXALL to
identify divergent raters (AFHRL-TR-76-82, AD-A034 327). Lackland AFB, TX: Occupational and
Manpower Research Division, Air Force Human Resources Laboratory.

Lindquist, E. F. (1953). Design and analysis of experiments in psychology and education. Boston:
Houghton Mifflin.

Ruck, H. W., Thompson, N. A., Brown, R. H., & Stacy, W. J. (in preparation). Development of a
task training emphasis scale and training priority equations. Manpower and Personnel
Division, Air Force Human Resources Laboratory.




APPENDIX A: CODAP CLUSTERING DESCRIPTION

The main clustering programs are INPSTD, OVRLAP, GROUP, and DIAGRM. Initially INPSTD adjusts
each rater's task ratings (0 to 9 scale) to a percentage of the sum of that rater's training
emphasis ratings, %TE. This adjustment standardizes all raters to a common mean of 100/NTASK.
(NTASK is the total number of tasks in the inventory.) The OVRLAP program establishes a
rater-by-rater similarity matrix using percent training emphasis in common (sum of linear overlap
on corresponding tasks) as the measure of similarity. This matrix is collapsed by the GROUP
program to form groups of raters with similar rating patterns. Each pair of raters or rater
groups which merge during the grouping is given a contiguous block of (KPATH) sequence numbers.
The hierarchical relationship between raters/groups can be graphically displayed via the DIAGRM
program. A valuable CODAP feature is the set of auxiliary programs that can be utilized to
report rater and group data summaries. Raters' training emphases, in terms of number of tasks
rated (non-zero) per duty category and percentage of training emphasis per duty, are summarized
in the DUVARS program printout. Rating patterns are summarized in the PRTDIS program printout,
which details each rater's performance on the 1 to 9 scale in terms of total number of tasks
rated and mean, standard deviation and distribution of ratings. These summaries are especially
relevant to group structure considerations when raters are listed in KPATH sequence. Analysis of
the PRTVAR program output allows determination of the extent to which biographical and computed
variables are shared by rater groups. For any selected cluster group, the JOBGRP program
computes the percent training emphasis per duty summary as a general description of the group
rating policy. Task-level differences between group rating policies can be highlighted by the
comparison of task means across groups using the FACPRT program. Rank-order correlations between
group task mean vectors, using the FACCOR program, test for rating policy differences.
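
The first two steps described above (the INPSTD standardization and the OVRLAP similarity matrix) can be sketched as follows. This is an illustration only, not the CODAP code: the function names are hypothetical, the ratings are invented, and "sum of linear overlap on corresponding tasks" is read here as the sum of task-wise minimums of the two raters' %TE values, which is an assumption. The GROUP merging step is not shown.

```python
def percent_training_emphasis(ratings):
    """Convert one rater's raw task ratings (0 to 9) to percentages of that
    rater's total, so every rater's values sum to 100 (mean 100/NTASK)."""
    total = sum(ratings)
    if total == 0:
        raise ValueError("rater assigned no training emphasis to any task")
    return [100.0 * r / total for r in ratings]

def overlap(pct_a, pct_b):
    """Percent training emphasis in common: sum of task-wise minimums
    (assumed reading of 'linear overlap on corresponding tasks')."""
    return sum(min(a, b) for a, b in zip(pct_a, pct_b))

def similarity_matrix(raw_ratings):
    """Rater-by-rater matrix of percent training emphasis in common."""
    pct = [percent_training_emphasis(r) for r in raw_ratings]
    return [[overlap(a, b) for b in pct] for a in pct]

# Hypothetical ratings: three raters, four tasks (NTASK = 4).
raters = [
    [9, 6, 3, 0],  # rater 0
    [6, 4, 2, 0],  # rater 1: same relative emphasis as rater 0, lower level
    [0, 1, 4, 5],  # rater 2: emphasis concentrated on different tasks
]
matrix = similarity_matrix(raters)
for row in matrix:
    print(["%.1f" % value for value in row])
```

In this example raters 0 and 1 overlap completely after standardization even though their raw rating levels differ, which shows why the %TE adjustment is applied before similarity is measured.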



U.S. GOVERNMENT PRINTING OFFICE: 1985 569-053/20011






